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MATRIX THEORY MODELS 



GHAITH A. HIARY AND ANDREW M. ODLYZKO 

Abstract. Results of extensive computations of moments of the Riemann 
zeta function on the critical line are presented. Calculated values are compared 
with predictions motivated by random matrix theory. The results can help in 
deciding between those and competing predictions. It is shown that for high 
moments and at large heights, the variability of moment values over adjacent 
intervals is substantial, even when those intervals are long, as long as a block 
containing 10 9 zeros near zero number 10 23 . More than anything else, the 
variability illustrates the limits of what one can learn about the zeta function 
from numerical evidence. 

It is shown the rate of decline of extreme values of the moments is modeled 
relatively well by power laws. Also, some long range correlations in the values 
of the second moment, as well as asymptotic oscillations in the values of the 
shifted fourth moment, are found. 

The computations described here relied on several representations of the 
zeta function. The numerical comparison of their effectiveness that is presented 
is of independent interest, for future large scale computations. 



1. Introduction 

Absolute moments of the Riemann zeta function on the critical line have been the 
subject of intense theoretical investigations by Hardy, Littlewood, Selberg, Titch- 
marsh, and many others. It has long been conjectured that the 2k th moment of 
|C(l/2 + it)\ should grow like Ck(}ogt) k for some constant Ck > 0. Conrey and 
Ghosh [CGlj reformulated this long-standing conjecture in a precise form: for a 
fixed integer k > 0, 

(1-1) \ [ lC(l/2 + it)\ 2k dt ~ (logTf 2 as T -> oo, 

where a(k) is a certain, generally understood, "arithmetic factor," and g(k) is 
an integer. Keating and Snaith [KS suggested there might be relations between 
random matrix theory (RMT) and moments of the zeta function, which led them 
to a precise conjecture for g(k). 

More generally it is expected the 2k th moment of |C(l/2 + it)\ should grow like 
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(1-2) ± £ \((l/2 + lt )f k dt~±£p k [log ^) dt, 

where Pk{x) is a polynomial of degree k 2 with leading coefficient a(k)g(k)/k 2 \, 



which is the same leading coefficient as in ( 1.1 1. Recently, Conrey, Farmer, Keating, 



Rubinstein, and Snaith [CFKRS1 gave a conjectural recipe, compatible with the 
Keating-Snaith prediction for g(k), to compute lower order coefficients of Pfc(x)j^] 
see Section 2. 

The purpose of this article is to present the results of numerical computations 
of integral moments of |C(l/2 + it)\ at relatively large heights. The main goal is to 
enable comparison with RMT, and other more complete predictions for moments 
of the zeta function on the critical line. These predictions are used to get insights 
that go beyond what can be derived even with the Riemann hypothesis. 

RMT has produced a variety of (so far still uproven) conjectures for the zeta 
function. Many of these conjectures have been either inspired by, or reinforced by, 
numerical evidence. In general, the agreement between numerical data and RMT 
predictions has been very good. But it is also known there are some quantities, 
such as the number variance in the distribution of spacings between zeta zeros, that 
depend explicitly on primes. Therefore, for some ranges of the spacing parameter, 
these quantities do not follow RMT predictions exactly. 

Similarly, there are suggestions that the behavior of high moments of the zeta 



function may involve arithmetical factors such as a(k) in conjecture (1.1), and so 
might not follow the RMT predictions completely. We provide numerical data that 
might be used in the future to shed light on this issue. Even today, the statistics 
we present on the variation in the moment values over various intervals might be 
informative in judging the extent to which RMT provides an adequate explanation 
for observed behavior. 

We computed the moments of |C(l/2 + it)\ for a set of 15 billion zeros near 
t = 10 22 , and for sets of one billion zeros each near t = 10 19 and t = 10 15 . We also 
computed the moments near t = 10 7 . These zero samples were used because they 
were the main ones available from prior computations by the second author. 

A relatively large sample size (such as a billion zeros) is useful in our study 
because it helps lessen the effects of sampling errors. This is an important consid- 
eration when investigating the zeta function because some aspects of its asymptotic 
behavior are reached very slowly (while others are reached extremely rapidly). In 
addition, tracking data from different heights (such as 10 15 , 10 19 , and 10 22 ) can 
help us understand how the zeta function approaches its asymptotic behavior. 

Ideally one would like to compute the moments over a complete initial interval: 



(1.3) ijf \((l/2 + l t)\ 2k dt. 

However, that could not be done for large values of T. Since we are interested in 
asymptotic behavior of the zeta function, we calculated numerically 



^See also Diaconu, Goldfeld, and Hoffstein |DGHI . 
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(1.4) 



1 

H 



T+H 



\C(l/2 + it)\ M dt, 



for various choices of k, T, and H by evaluating the integral directly. Section 5 



presents the details. We compare the empirical value of (1.4) against 



(1.5) 



Pk lo 



dt. 



i_ r T +" 

H J T ' " V""° 27r , 
where the coefficients of Pk(x) are as given in [CFKRS1 and |CFKRS2] . 

We find that for high moments of |£(l/2 + zi)|, and near values of T such as 10 22 , 
the variability over adjacent intervals is substantial, even when those intervals are 
long, as long as a block containing 10 9 zeros; see Table 10 Moreover, the variability 
increases rapidly with height, a tendency that is observed even for relatively low 
T pa 10 s (see the comments following Table [7]). 

The leading term prediction ( |1.1[ ) does not match well the moment values at the 
heights where we carried our computations. This is not surprising. As k increases, 
the leading coefficient a(k)g{k)/k 2 \ becomes extremely small (see Table[l]), whereas 
actual moments of the zeta function do grow moderately rapidly. Therefore, one 
cannot hope to get a good approximation for the actual 2k th moment from the 
leading term on its own unless T is large enough to enable the leading power 
(logT) fc to compensate for the small size of the leading coefficient a(k)g(k)/k 2 l. 
In particular, for high moments (relative to T), the contribution of lower order 
terms is likely to dominate. 

TABLE 1. a(k) and g(k)/(k 2 )l as given by \2.l\ and \2.2\ respectively for 
the first few integers k. Numerical values are truncated. 



k 


o(Jfc) 


g(k)/(k*)\ 


1 


1 




1 




2 


6.0792 x 10" 


-l 


8.3333 x 10" 


-2 


3 


4.9321 x 10 


-2 


1.1574 x 10 


-4 


4 


2.1468 x 10" 


-4 


1.1482 x 10- 


-9 


5 


3.1326 x 10" 


-8 


4.5202 x 10" 


17 


6 


1.1415 x 10- 


13 


4.4937 x 1Q- 


27 


7 


8.4291 x 10" 


21 


7.8100 x 10- 


40 


8 


1.0751 x 10- 


29 


1.7402 x 10- 


55 



By contrast, the full moment conjecture of Conrey, Farmer, Keating, Rubinstein, 
and Snaith [CFKRSf] , which gives predictions for lower order terms of Pfc(x), and 
hence takes their contributions into account, is significantly better at matching the 
values of zeta moments that we computed. For example, using a block of 1 9 ze ros 
near T = 10 15 , and for moments as high as the 12 th moment, the ratio of (1.4) to 
(1.5) differs from the expected 1 by less than 4%; see Table [8] 

As remarked earlier, though, our computations do suggest there are large os- 
cillations in the remainder terms of the moment conjectures for high moments. 
For example, we find two almost consecutive intervals near T = 10 22 , containing 
10 9 zeros each, such that the 12 th moment in one interval is about 16 times the 
12 th moment in the other; see Table 10 Such variability is unlikely to come from 



any reasonable conjecture for Pfc(logi/(27r)). The reason is that if zeta moments 
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are modeled by such polynomials, then perturbing H in (1.5) is unlikely to pro- 



duce significant enough changes for the value of the integral to change by large 
multiplicative factors, as seen in our data, since P(logi/27r) varies quite slowly as 
function of t. 

To further highlight the difficulties and limitations to be expected in any numer- 
ical study of high moments, consider Table |12| It is based on computations over 
an interval containing 1.5 million blocks, where each block spans 10 4 zeros near 
T = 10 22 . We find there that more than 50% of the value of the 12 th moment 
comes from just the 5 blocks that contribute the most. Interestingly, the size of the 
largest contribution to the 2k th moment among such blocks is approximated 



rather well (except for the first few n) by a law like 



(1.6) n* largest block contribution w (contribution of the largest block). 

Approximation (1.6) persists for a long range of n; see Figure [lj However, as is 
explained in Section 4, the constant 5 in this empirical relation is almost surely a 
function of the height and number of blocks. 

To help understand the observed variability in the moment data, recall that 
according to a central limit theorem due to Selberg [S] , the real and imaginary parts 
of log £(1/2 + it) I \J (1/2) log log t converge in distribution, over fixed intervals, to 
independent standard Gaussians. In particular, the distribution of |C(l/2 + it)\ 
tends to log-normal. So from the outset, we expect the variability in the values 
of (1.4 1 to increase significantly with T and k. To iron out such variability, we 
would like to take H in integral (1.4) relatively large, ideally H w T, although, 
from probabilistic considerations, we might expect that H s» T 1 / 2 might suffice. 
Since this is impractical with current computational resources when T = 10 22 say, 
we typically took H significantly smaller than T. For example, near both T = 10 15 
and T = 10 22 , the largest we could take H was « 10 s . But then attempts to 



deduce asymptotics for the integral (1.4) from those for the integral (1.3) encounter 



a problem, which we illustrate in the case of the second moment, known to satisfy 



(1.7) 



/ |C(l/2 + it)\ 2 dt = T log ^ + T(2 7 - 1) + Ei(T) . 
Jo 27r 



where Ei(T) — O(T 35 / 10S+<L ); see [12]. (This result does not depend on the assump- 
tion of the Riemann hypothesis.) Based on (1.7), one might suspect 



(1.8) 



H 



T+H 



\((l/2 + lt)\ 2 dt 

T + H, T + H 
log 



H 



2tt 



Ilog~ + (2 7 -l) + (l) 



However, in order for (1.8) to be valid, we need 



(1.9) 



E^T + H) - E 1 {T) 
H 



-o(l) 



Relation (1.9) is certainly true if H > T 1 / 2 say, but not necessarily true if H is 
substantially smaller. Since in our data the largest H we can take is often far below 
T 1 / 2 , the agreement with prediction depends on whether over short intervals the 
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remainder term E\(t) varies slowly enough for something like (1.9) to hold. Of 
course, a similar statement applies to the remainder terms E^.{t) corresponding to 
higher moments. Therefore, in summary, the variability of the moment data is a 
function of the following: 

• is the height T large enough for asymptotics to apply? 

• are the number and size of samples large enough to be representative of a 
log-normal variable? 

• given k, are there significant oscillations in the remainder term Ek{T)l 

Numerical data show some long-range correlations (on the scale of a few thousand 
zeros) for the second moment; see Figure [2j However, we do not find similar 
evidence for such correlations in the case of higher moments. To further examine 
the correlations in the second moment, we numerically investigate the shifted fourth 
moment, where Kosters |Koj proved a kernel law for shifts on the scale of mean 
spacing of zeros. For larger shifts, we observe a departure from this law, and the 
onset of asymptotic oscillations; see Figures [3] and [2j 

In the course of performing our moment computations, various local models of 
the zeta function (i.e. models to numerically approximate |C(l/2 + it)\ over short 
intervals) were analyzed. The results of those analyses are of independent interest, 
for guidance in selection of numerical methods for other calculations that might be 
done in the future with the zeta function. An attractive local model that was tested 
is due to Gonek, Hughes, and Keating CI IK . Another model, which numerically 
works quite well, is to approximate |C(l/2 + it)\ by a suitably normalized polyno- 
mial that vanishes at the zeta zeros near t. Numerical computations suggest this 
approximation converges linearly in the number of zeros used. (We consequently 
found an elementary proof of this.) 

The paper is organized as follows. In Section 2, we provide a further discus- 
sion of some known results and conjectures for the moments of |C(l/2 + it)\. In 
Section 3 we document the datasets used in our study, and calculate the various 
moment predictions for them. Section 4 contains the results of our numerical com- 
putations of the 2 nd to 12 th even moments. In Section 5, we outline the numerical 
methods employed, which include point-wise approximations to zeta (the main one 
being the Odlyzko-Schonhage algorithm), the choice of the integration technique, 
and implementation details. In Section 6, we numerically verify the accuracy of 
some local (short-range) point-wise approximations of zeta that were used in our 
computations. The results of these computations are of independent interest. 

2. Results and conjectures for zeta moments 
Number-theoretic methods were not able to predict, in general, the value of the 



factor g(k), which occurs in asymptotic (1.1). In contrast, a general prediction for 



the arithmetic factor a(k) was established explicitly as 



(2.1) a(k) := I] (1 - Vpf E 4(p m )p- 

p \ \m— 

where dk is the fc-fold divisor function. We remark the size of a(k) is mostly 
determined by the contributions of the first 0(k 2 ) primes. 

Although g(k) is not critical to determining the order of £(1/2 + it), it is im- 
portant in comparing empirical data to conjectures due to the extra precision it 
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provides. A conjecture for the value of g(k) was first provided by Keating and 
Snaith [KSj . Motivated by recent progress in developing (conjectural, numerical, 
and heuristic) connections between RMT and the zeta function, they suggested 
there might be a relation between moments of the characteristic polynomials of 
unitary matrices and moments of the zeta function. This led them to conjecture: 

(2.2) gw : =n jl . 

Specifically, Keating and Snaith considered Zn(A, 9), the characteristic poly- 
nomial (in e l9 ) of an N x N unitary matrix A. They used Selberg's formula to 
calculate the expected moments of \Zn(A, 6)\, where the expectation is taken with 
respect to the normalized Haar measure on the group of N x TV unitary matrices. 
They found for integer values of k > that 

3=0 U '' 



The right side of formula (2.3) is a polynomial in N of degree k 2 , with leading 
coefficient given by the right side of ( |2.2[ ). (The full result of jKSj is more general 
as it can be continued to the half-plane 5i(fc) > —1/2.) 

The absence of the arithmetic factor a(k) from formula ( |2.3| ) hints that the 
moments of zeta might split as the product of two parts, one that is universal 
and is due to RMT fluctuations (on the scale of mean spacing), and another that 
is specific to zeta and corresponds to the contributions of small primes; see |BK] 
and [GHKj . A similar phenomenon appears in some statistics of zeta zeros, such 
as zero correlations and number-variance of zero spacings. 

It appears, however, that RMT is only able to predict the universal part of the 
leading term asymptotic for zeta moments. For example, Ivic [XT] and, separately, 
Conrey [C] , determined explicitly the coefficients of the fully proven fourth moment 
polynomial P^ix). Their results lead to 

|C(l/2 + it)\ A dt ps 0.050660 (fog ~j + 0.496227 ^log ^\ 

(2.4) f T\ 2 ( T 

+ 0.937279 log — + 1.35334 log 



2tt J V. 27T, 

- 0.040924 + E 2 (T)/T, 

where numerical values are obtained via truncation. By comparison, the straight- 
forward RMT-based prediction for the fourth moment is determined by a quantity 
like 

(2.5) a(2) E N \Z N (A, fl)| 4 = a(2) f[ ^^ff - 



where a(2) = 0.607927. . . . If expression (2.5| correctly predicts lower order terms 



for the fourth moment, then it should yield something similar to right side of (2.4 1, 
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but it does not; if we make the identification N — > logT/ (2ir), as suggested by [KSJ, 



then expression (2.5) produces the following polynomial instead: 



0.050660 log 



(2.6) 



T 
2rr~ 



0.405284 j log ^\ + 1.16519 ('log 



1.41849 log 



T 
2r7 



0.607927. 



So, the leading terms in polynomials (2.4) and (2.6) agree, but lower order terms do 



not. This discrepancy indicates RMT does not model zeta moments well enough to 
enable correct predictions of lower asymptotic terms. However, in numerical studies 
of zeta moments, it is useful to go beyond the leading coefficient, to have predictions 



for lower order coefficients in the moment polynomial Pfc(x) in conjecture (1.2|. 

This is one of the reasons the recent work of Conrey, Farmer, Keating, Rubin- 
stein, and Snaith [CFKRSlJ [CFKRS2] has been useful to our numerical investiga- 
tion. They gave a conjecture for Pfc(x) which agrees with the known cases k = 1 
and k = 2, as well as with the RMT prediction (1.1). Their conjecture is moreover 
supported by empirical data at low heights (near T — 2 x 10 6 ). Importantly, they 
computed the coefficients of Pk{x) for the first few integers k. This enabled us to 
calculate their predictions for moments beyond the fourth moment (2k = 4). 



3. Preliminaries 

We sample consecutive blocks of zeros B n . The blocks B n are of equal size, and 
are located in a neighborhood of the height T. We denote the size of a block B n 
by \B n \, and choose \B n \ = |-B ra +i| for all relevant n. We consider many ratios of 
the form 



m , I Bn IC(l/2 + i*)| 2fc rf* 

( } S Bn P k (^)dt ' 

where, by an abuse of notation, the symbolism J g is short for integrating over the 
interval spanned by block B n . So if a n and /3 n denote the ordinates of the first 
and last zeros in block B n , respectively, then f B := We expect the average of 
many ratios of the form (3.1 1 to approach 1. Notice if T is large, and the length of 
the interval spanned by block B n is small compared to T, then the denominator in 
ratio (3.1 ) is largely a function of T multiplied by the length of the interval spanned 
by B n . 

In our computations, we aggregated the moment data for each 1,000 consecutive 
zeros. (Thus blocks did not have the same lengths. However, since zeros are 
spaced quite regularly, the differences in lengths of blocks with the same number 
of zeros at the same height were minor.) The block sizes used in our computations 
range anywhere from 10 3 to 10 9 consecutive zeros (data for a block size of 10 4 , for 
instance, was obtained by averaging the aggregated data for 10 successive blocks 
of 1,000 zeros each); see Section 5 for details. Most of our computations were 
performed in the vicinity of the 10 23 -rd zero, which is near t = 1.3 x 10 22 . 

To investigate the behavior of the remainder term E^(t) in the full moment 
conjecture |CFKRSi] . we numerically examine the quantity 
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where a n and /3„ denote the ordinates of the first and last zeros of block B n , 
respectively. 

Table [2] contains the exact coordinates of the sets for which we computed mo- 
ments. Except for set S8, all these zeros were computed by the second author at 
AT&T Labs-Research in the late 1990s, in an extension of the earlier computations 
described in |01j . and will be documented in the book |HOj . Some output files 
from those computations were corrupted by buggy archiving software during their 
transfer from AT&T. Although we managed to clean up many of these files, some 
data was lost irretrievably. As a result, there were a few instances where we have 
missing blocks of anywhere between 1,000 to 4,000 consecutive zeros. In such cases, 
we skipped the missing blocks. 



TABLE 2. The datasets 



Set 


Approximate span 


First and last zeros, respectively 


A23 


5 billion zeros 


1.30664344087953251142539323425414 x 10 22 
1.30664344087959822199974045053551 x 10 22 


B23 


11 billion zeros 


1.30664344087935097997293481220857 x 10 22 
1.30664344087949333176034585636412 x 10 22 


O20 


1 billion zeros 


1.52024401160089830109496959179 x 10 19 
1.52024401161628795187223388010 x 10 19 


Z16 


1 billion zeros 


2.5132741232472002749333722 x 10 15 
2.5132743103949376298283407 x 10 15 


58 


100 million zeros 


14.1347251417347 
42653549.7609516 



Table [3] contains the moment values predicted by the leading term ^TTTj ) at T = 
10 22 . Predictions start to grow more slowly at the 14*' 1 moment and they completely 
collapse by the 28 th moment. But by the asymptotic relation (1.7), we have 



(3.3) ^ T |C(l/2 + rf)| 2 ^>l. 

for T > T , where T > is some absolute number. It follows by induction and 
Jensen's inequality that for k > 1 and T > Tq 



(3.4) 1 < I^ T |C(l/2 + ^)| 2fc d< < ^ T |C(l/2 + rf)| 2fc+2 ^. 

Therefore, the 2k th moments should increase with k for T > T . In view of Table § 
the height T must then be larger than 10 22 in order to reach the leading term 
asymptotic of certainly the 18 th moment, and most likely the 14 th and higher 
moments. This is the reason we chose to cut off our computation at the 12 th 
moment. 

Table [4] contains the full moment predictions ( |1.5| ) for the various sets listed in 
Table [2] where the coefficients of the polynomial Pfe(x) in (1.5) are as computed 
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TABLE 3. The expected moments as given by the right side of for T 

equal to the ordinate of the first zero in set A23. 



Moment 


Value 


2 


5.09 xlO 1 


4 


3.40 xlO 5 


6 


1.31 xlO 10 


8 


5.04 xlO 14 


10 


6.67 xlO 18 


12 


1.44 xlO 22 


14 


2.86 xlO 24 


16 


3.27 xlO 25 


18 


1.44 xlO 25 


20 


1.74 xlO 23 


22 


4.29 xlO 19 


24 


1.64 xlO 14 


26 


7.61 xlO 6 


28 


3.50 xlO" 3 



TABLE 4. The expected 2k th moment (truncated here after three digits) 
as given by integ ral l|1.5^, w here the coefficients of Pfc(x) are as computed 
in CFKRS1 and CFKRS2 . For each set, the integral is evaluated by setting 
T to the ordinate of the first zero, and T + H to the ordinate of the last zero. 
The first and last zeros of each set are specified in Table [2] 



Set 


2k = 2 


2k = 4 


2k = 6 


2k = 8 


2k = 10 


2k = 12 


A23 
B23 
O20 
Z16 
58 


5.02 x 10 1 
5.02 x 10 1 
4.34 x 10 1 
3.47 x 10 1 
1.58 x 10 1 


3.82 x 10 5 
3.82 x 10 5 
2.20 x 10 5 
9.41 x 10 4 
5.28 x 10 3 


3.30 x TO 10 
3.30 x 10 1() 
1.03 x 10 10 
1.77 x 10 9 
5.58 x 10 6 


1.04 x 10 16 
1.04 x 10 16 
1.54 x 10 15 
8.77 x 10 13 
9.87 x 10 9 


7.32 x 10 21 

7.32 x 10 21 
4.66 x 10 20 
7.63 x 10 18 

2.33 x 10 13 


8.89 x 10 27 
8.89 x 10 27 
2.26 x 10 26 
9.70 x 10 23 
6.67 x 10 16 



TABLE 5. The expected 2k th moment (truncated here after three digits) 
when the polynomial Pj, (x) in ( |1.5| is replaced by its leading term only, which 
has coefficient a(k)g(k)/k' 2 \. For each set, the integral is evaluated by setting 
T to the ordinate of the first zero, and T + H to the ordinate of the last zero. 
The first and last zeros of each set are specified in Table [2] 



Set 


2k = 2 


2k = 4 


2k = 6 


2k = 8 


2k = 10 


2k = 12 


A23 
P23 
O20 
Z16 


4.90 X 10 1 
4.90 x 10 1 
4.23 x 10 1 
3.36 x 10 1 


2.94 X 10 5 
2.94 X 10 5 
1.62 x 10 5 
6.47 x 10 4 


9.44 x 10 9 
9.44 x 10 9 
2.49 x 10 9 
3.13 x 10 s 


2.80 x 10 14 
2.80 x 10 14 
2.61 x 10 13 
6.57 x 10 11 


2.66 x 10 18 
2.66 x 10 18 
6.56 x 10 16 
2.07 x 10 14 


3.84 x 10 21 

3.84 x 10 21 

1.85 x 10 19 
4.66 x 10 15 



in |CFKRSl~j and [CFKRS2 . Table [5] contains the moment predictions when the 
polynomial Pk(x) in (1.5) is exchanged for its leading term only. 

The predictions (1.5| for the sets A23, B23, O20, Z16, and S8 are calculated 
by evaluating the integral (1.5) for the exact coordinates given in Table [2] Notice 
when T is large, predictions are insensitive to the precise choice of H as long as H 
is not too large, say H < T 1 / 2 . This will be the case for most samples from those 
sets. 

Finally, we point out the following notational convention in the next section. 
The heading of many of the tables in Section 4 will have the format 
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X samples Y 

This means the values listed in the table are based on X (usually consecutive) 
blocks, where each block consists of Y consecutive zeros. For simplicity, the block 
size Y is often written in abbreviated form; e.g. if Y is 1M, this means each block 
consists of a million consecutive zeros. Similarly, IB means each block consists of a 
billion consecutive zeros, and so on. We frequently refer to values of the moments 
over blocks simply as samples. 



4. Numerical results 

At the heights used in our study, there are substantial disparities between the 
full moment prediction ( 1.5 ) and the leading term prediction (obtained by replacing 
Pk{x) in (1.5) by its leading terms only). So, a relatively a good agreement with 
one conjecture implies lack of agreement with the other. For this reason, let us 
focus our attention on the full moment prediction (1.5), which is believed to be 
more accurate. 

Conrey, Farmer, Keating, Rubinstein, and Snaith [CFKRSl] (see also [CFKRS2 ) 
considered ratios of the form 



(41) / r |C(l/2 + if)| 2fc <ft 

I? * ' 

for T near 10 6 . They found a very good agreement between the data and the full 
moment predictions at that height. We remark the predictions for high moments 
near T = 10 6 , and also near T — 10 7 , 10 15 , 10 19 , 10 22 (heights used in our experi- 
ments), are not really determined by the leading term asymptotic, but by the lower 
order terms in the full moment conjecture. This is because lower order terms still 
contribute the most even for T as large as 10 22 . 

We first discuss the moment data at low heights near 10 7 . This is set S8 in 
Table [2j which consists of the first 10 s zeros. To aid in the analysis, let us define 
the following subsets: 



TABLE 6. Some subsets in 58. 



Set 


Initial zero 


Final zero 


Size of the set 


si 


10,000,000 


10,100,000 


10 5 zeros 


s2 


90,000,000 


90,100,000 


10 zeros 


s3 


10,000,000 


11,000,000 


10 6 zeros 


s4 


90,000,000 


91,000,000 


10 6 zeros 


s5 


10,000,000 


20,000,000 


10 7 zeros 


s6 


90,000,000 


99,980,000 


0.9998 X 10 7 zeros 



Table[7]lists the value of the ratio (3.1 ) for each of the subsets defined in Table|6] 
For example, the first entry in Table |7J which corresponds to 2k = 2 and subset si, 
was calculated using the formula: 



(4.2) 



j si ic(i/2 + ^)i 2 dt _ j;;;;^ 5 ic(i/2 + ^)i 2 dt 

I Sl PiH^)dt ~ /™+«* P X (log 



2n / 



dt 
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TABLE 7. Ratio of empirical moment to full moment prediction for the zero 
subsets defined in Table \E\ 



2k 


si 


s2 


s3 


s4 


s5 


s6 


2 


1.000 


1.001 


1.000 


1.000 


1.000 


1.000 


4 


0.996 


1.007 


1.000 


1.000 


1.001 


1.000 


6 


0.975 


0.999 


0.997 


0.996 


1.001 


1.000 


8 


0.943 


0.981 


0.992 


0.983 


1.001 


0.999 


10 


0.909 


0.962 


0.989 


0.962 


1.001 


1.000 


12 


0.875 


0.940 


0.987 


0.931 


1.000 


1.004 



Table [7] indicates a fairly good agreement with the full moment prediction (1.51 



However, even at these modest heights, we find evidence for substantial variability 
in the moment data. For instance, we can fi nd b locks of 10 5 zeros each such that 
the 12 th moment is half what is predicted by (1.5| (e.g. block [75xio 7 > 75xio 7 +io 5 D- 



Also, the variability in the moment data seems to increase with height. For example, 
when we sample 100 consecutive blocks of 10 5 zeros each near zero numbers 21 
million and 91 million, the standard deviation of the 12 th moment, divided by the 



full moment prediction (1.5), increases from about 0.994 to about 2.385, which 
seems substantial considering the statistics discussed here change on a logarithmic 
scale. 

We next consider the situation higher on the critical line. Table [8] contains 
statistical summaries for moment values from set Z16, which is a set of 10 9 zeros 
near the 10 16 -th zero. The first five columns of Table [8] were calculated as follows. 
We subdivided set Z16 into 10 6 blocks B ni where each block has 10 3 consecutive 
zeros. (As was mentioned earlier, we aggregated the moment data for each 1000 
consecutive zeros, so the smallest block size we can work with is 1000; see Section 
5 for details.) Next, for each k £ {1, . . . , 6}, we computed the quantities 

(4.3) [ " \((l/2 + it)\ 2k dt, ne[l,10 6 ], 

where j3 n and a n are the ordinates of the first and last zeros of block B n respectively 
(the blocks are chosen so the last zero of B n is the first zero of B n+1 ). Lastly, for 
each 1 < j < 1000 we calculated 

1 v^ilOOO j 
,n _ 1000 2^rt=l + (,;-l)1000 j » 

where T and T + H are chosen equal to the ordinates of the first and last zeros of 



the set Z16, which is where the blocks B n reside. Notice the denominator in (4.4) 
remains practically constant for H small compared to T, and so it is essentially the 
same as writing 

(4-5) / P k log — )dt , 

where ai+iuoofj-i) i s the ordinate of the first zero of -Bi + iooo(j-i)> an d /?ioooj is 
the ordinate of the last zero of -Bioooj- This procedure yields 1000 numerical values 
(i.e. sample points) {xj, 1 < j < 1000}, where each Xj now corresponds to a block 
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of 10 6 zeros. The first five columns of Table [8] contain the mean, minimum, and 
maximum of these values, as well as their standard deviation about the empirical 
mean. The other columns in the table were calculated similarly, but using larger 
block sizes (10 7 , and 10 8 , zeros, respectively). 

TABLE 8. Ratio of empirical moment to full moment prediction for samples 
from Zld. The full moment prediction is given in Table [4] The columns Min, 
Max, Mean, and SD refer to the min, max, mean, and the standard deviation 
of the ratios about their empirical mean. 



2k 


Mean 


Z16: 1000 samples 1M 


Z16: 100 samples 10M 


Z16: 10 samp 


es 100M 






Min 


Max 


SD 


Min 


Max 


SD 


Min 


Max 


SD 


2 


1.000 


0.987 


1.009 


0.003 


0.999 


1.002 


0.001 


1.000 


1.000 


0.000 


4 


1.000 


0.831 


1.602 


0.077 


0.969 


1.048 


0.017 


0.996 


1.006 


0.003 


6 


1.003 


0.494 


8.687 


0.530 


0.805 


1.796 


0.143 


0.966 


1.079 


0.035 


8 


1.019 


0.178 


37.68 


1.837 


0.532 


4.914 


0.555 


0.882 


1.262 


0.143 


10 


1.039 


0.046 


101.0 


4.202 


0.277 


11.57 


1.327 


0.709 


1.800 


0.354 


12 


1.035 


0.009 


190.4 


7.182 


0.116 


20.67 


2.301 


0.484 


2.553 


0.628 



In view of Table [HJ we see the standard deviation of samples generally declines 
like 1/ y/\B\, where \B\ is the block size. This indicates that the moments over 
such long blocks (millions of zeros each) act statistically independently. Also, the 
range of values spanned by high moments (i.e. the interval [Min, Max]) appears 
to decline linearly with \B\. This observation is likely due to the sparsity of blocks 
with large contributions. For example, if the block size is increased from 1M to 
10M, then since blocks with large contributions are rare, and since they usually do 
not occur near each other (at least as far as our data is concerned) , the range [Min, 
Max] should shrink to something like [Min, Max/10], as observed. 



Table m shows reasonable agreement with the full moment prediction (1.5) near 
T = 10 1 . At larger heights, the agreement is poorer. This is especially true for 
high moments, roughly the 8 th and higher moments. For instance, consider Table|9j 
which is based on a set of 1.5 x 10 10 zeros in the vicinity of the 10 23 -rd zero. There, 
the 12 th moment is about half what is expected. This growing disparity between 
prediction and evidence is likely due to a substantial extent to the fact that the 
integration interval is considerably shorter relative to the height around the 10 23 -rd 
zero. 

TABLE 9. Ratio of empirical moment to full moment prediction for samples 
from A23 U B23 (full moment prediction is given in Table [4|. The columns 
Min, Max, Mean, and SD refer to the min, max, mean, and the the standard 
deviation of the ratios about their empirical mean. 



2k 


Mean 


A23 U B23: 15,000 samples 1M 


A23 U B23: 1500 samples 10M 


Min 


Max 


SD 


Min 


Max 


SD 


2 


1.000 


0.978 


1.045 


0.007 


0.993 


1.008 


0.002 


4 


1.000 


0.642 


8.472 


0.270 


0.859 


1.895 


0.082 


6 


0.990 


0.172 


121.4 


2.289 


0.468 


13.38 


0.735 


8 


0.935 


0.021 


556.1 


8.179 


0.142 


56.51 


2.623 


10 


0.791 


0.001 


1184 


15.52 


0.025 


118.8 


4.945 


12 


0.574 


0.000 


1486 


18.27 


0.003 


148.7 


5.799 



Table [TU] provides a summary of all our moment data. It shows that at large 
heights (e.g. near T = 10 22 ), even a block size of 10 9 zeros is not enough to control 
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the variability in the moment data over adjacent blocks. An extreme example is 
that of the adjacent blocks &5 and 66, where the 12 th moment is multiplied by more 
than a factor of 16 from one block to the next. 

TABLE 10. Ratio: empirical moment /full moment prediction, for subsets 
from 58, Z16, O20, A23 and B23. Each subset consists of 10 9 consecutive 
zeros except for s8, which consists of about 4 X 10 7 zeros (specifically, zeros 
60M to 99.98M in 58). Subsets ax and bx are of increasing height, and are 
approximately consecutive except for some small gaps. The column T is the 
approximate height at which the subset starts, and column H is the approxi- 
mate length of the interval spanned by the subset. 



Sample 


2k = 2 


2k = 4 


2k = 6 


2k = 8 


2k = 10 


2k = 12 


T 


H 


s8 


1.000 


1.000 


1.000 


1.001 


1.002 


1.004 


10 7 


10 7 


zl6 


1.000 


1.000 


1.003 


1.019 


1.039 


1.035 


10 15 


10 8 


o20 


1.000 


1.000 


1.003 


0.982 


0.873 


0.667 


10 19 


10 s 


61 


1.000 


0.997 


0.941 


0.731 


0.419 


0.176 


10 22 


10 8 


62 


1.000 


1.006 


1.067 


1.183 


1.166 


0.908 


10 22 


10 8 


63 


1.000 


1.002 


0.989 


0.864 


0.587 


0.297 


10 22 


10 8 


64 


1.000 


0.994 


0.931 


0.740 


0.454 


0.208 


10 22 


10 8 


65 


1.000 


0.991 


0.895 


0.632 


0.324 


0.123 


10 22 


10 8 


66 


1.000 


1.008 


1.127 


1.543 


2.011 


2.011 


10 22 


10 8 


67 


1.000 


0.991 


0.932 


0.772 


0.517 


0.269 


10 22 


10 8 


68 


1.000 


0.999 


0.974 


0.836 


0.570 


0.303 


10 22 


10 8 


69 


1.000 


1.003 


0.980 


0.835 


0.568 


0.306 


10 22 


10 8 


610 


1.000 


0.988 


0.903 


0.685 


0.399 


0.174 


10 22 


10 8 


ol 


1.000 


0.995 


0.922 


0.684 


0.374 


0.151 


10 22 


10 8 


a2 


1.000 


1.002 


1.005 


0.976 


0.820 


0.542 


10 22 


10 8 


a3 


1.000 


1.005 


1.108 


1.449 


1.820 


1.823 


10 22 


10 8 


a4 


1.000 


1.006 


1.061 


1.185 


1.213 


1.003 


10 22 


10 8 


a5 


1.000 


1.007 


1.019 


0.911 


0.626 


0.321 


10 22 


10 8 



Since high moments are determined by a few samples with large contributions, 
it is useful to measure the impact of such samples more accurately. So, consider 
Table |11| which is a "moments of the moments" table calculated near zero number 
10 23 . 

TABLE 11. Moments of the ratios empirical moment/predicted moment af- 
ter being normalized to have mean and variance 1 (predicted moment is 
given in Table [4^ . The columns p = 3, p = 4, ...etc refer to the third moment, 
fourth moment,... etc, of the quantity empirical moment /predicted moment. 



2k 


A23 U B23: 150,000 samples 100,000 


p = 3 


p = 4 


p = 5 


p = 6 


p = 7 


p = 8 


2 


1.051 


7.251 


42.41 


411.7 


4664 


58750 


4 


20.61 


1042 


69580 


5135000 


396100000 


31280000000 


6 


83.86 


10960 


1599000 


243800000 


37990000000 


6000000000000 


8 


146.6 


26920 


5264000 


1059000000 


216300000000 


44580000000000 


10 


185.6 


39590 


8836000 


2014000000 


464100000000 


107700000000000 


12 


209.5 


48560 


11660000 


2845000000 


700300000000 


173400000000000 



In Table [TT} there is a notable slowdown in the growth rate with k of the "mo- 
ments of the moments." This slowdown is directly related to the frequency and pre- 
cise size of large block contributions. To get a better sense of the distribution of such 
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contributions, let us consider Tabic [T2| It lists the sum of the n largest block con- 
tributions to each moment for several n. To explain how the table was constructed, 
take the entries corresponding to 2k = 2 and n = 1, 2, . . . , 5 for example. They were 
calculated by first computing {xj, 1 < j < 1.5 X 10 6 }, where each Xj is a ratio of the 
form (empirical second moment /predicted second moment) obtained using a block 
of 10 4 zeros from A23U B23. The sequence {xj, 1 < j < 1.5 x 10 6 } was then sorted 
in descending order. That resulted in another sequence {yj,l < j < 1.5 x 10 6 }; 
so, ?/i corresponds to the largest contribution among all blocks. The entries corre- 
sponding to 2k = 2 were then calculated as 

/ ■— i Vi 

(4.6) 100 x i , n= 1,2,3,4,5. 

Ej=i i/i 



TABLE 12. For each k, samples of 10,000 zeros each are sorted according 
to their contribution to the 2k th moment in decreasing order. For each k, the 
column lists the percentage contribution of the first n sorted samples to the 
sum of all samples (1.5 million samples in total). 



n 


A23 U B23: 1.5M samples 10,000 


2k = 2 


2k = 4 


2k = 6 


2k = 8 


2k = 10 


2k = 12 


1 


0.0 


0.1 


0.8 


4.0 


10.0 


17.2 


2 


0.0 


0.1 


1.6 


7.6 


18.9 


32.4 


3 


0.0 


0.1 


2.1 


9.9 


24.2 


40.6 


4 


0.0 


0.2 


2.6 


11.8 


28.0 


46.1 


5 


0.0 


0.2 


3.0 


13.4 


31.4 


50.8 



From Table 12 we see that near zero number 10 23 , more than 50% of the value of 
the 12 t/l moment is determined by the 5 largest contributing samples out of a total 
of 1.5 million samples. In comparison, the contribution of the analogous samples 
in the case of the 2 nd moment is negligible. This statistic illustrates the difficulty 
of sufficiently controlling high moments in experiments. 

To further understand extreme values of the moments, we consider the function 



(4.7) f(n):=f k (n) = ^, n>l, 

Vi 

which records the n th largest block contribution as a fraction of the largest block 
contribution, where each block spans 10 4 zeros. It is worth mentioning that the 
largest contribution to the 12 th moment among such blocks found in our computa- 
tions is 148,569 times the expectation according to Table [4] Figure [T] is a graph of 
fk(n) for 1 < n < 50, and 2k — 4, 8, or 12. The figure is based on the full set of 15 
billion zeros near T = 10 22 (sets A23 and £23). 

Apart from a small initial segment, the lines in Figure [T] parallel rather closely 
the power law p(n) := Pk(n) — n~ k / 5 . Moreover, we obtain a similar graph if we 
instead use the set of 10 zeros near T = 10 15 available to us (i.e. set ZIQ). Figure[l] 
would arise if the n th largest value of |£(l/2 + it)| in the interval of integration were 
about n -1 / 10 times the largest value. The log- normal distribution law suggests that 
asymptotically this should hold with the 1/10 replaced by 
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(AX) 1 l0 S lQ g T 

where M is the number of blocks. 



FIGURE 1 . Largest block contributions sorted in descending order and nor- 
malized according to | |4.7| using a sample of 1.5 million blocks from near 
T = 10 22 (each block is 10 4 zeros). 
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As evidenced by the data so far, the empirical moments vary substantially, even 
when calculated using long blocks of zeros. So, it might be interesting to consider 

( L |C(i/2 + zi)| 2fe dA 
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In other words, we consider log (empirical moment /predicted moment), which is 
equal to 



(4.10) 



log 1 



Ek(/3 n ) ~ E k (a n ) 



where a n and j3 n denote the ordinates of the first and last zeros of block B n respec- 
tively. The distribution of this quantity could shed some light on the remainder 
term in the full moment conjecture. Table [13] contains statistical summaries for 
(4.10) using several zero sets. The numbers do not vary much from sample to 



sample. In particular, the various summary statistics change relatively little across 
the lower tables, which come from approximately the same height. Also, Table |l4| 
suggests if we sample (4.10) over many blocks B ni then normalize the samples to 



have mean and variance 1, the moments of the normalized samples generally start 
decreasing when 2k > 6. It is not clear why there is such a trend, or whether it 
will persist. 

TABLE 13. log(empirical moment /predicted moment), where predicted mo- 
ment is given in Table [4] The columns Min, Max, Mean, and SD refer to the 
min log ratio, max log ratio, mean log ratio, and the standard deviation of the 
logs of ratios about their empirical mean. The set from B23 corresponds to b2 
in Table [Tol 



2k 


Z16: 10,000 samples 100,000 


O20: 10,000 samples 100,0000 


Min 


Max 


Mean 


SD 


Min 


Max 


Mean 


SD 


2 


-0.048 


0.069 


0.000 


0.013 


-0.066 


0.155 


0.000 


0.021 


4 


-0.598 


2.054 


-0.031 


0.230 


-0.757 


2.938 


-0.076 


0.345 


6 


-1.871 


4.400 


-0.326 


0.671 


-2.468 


5.298 


-0.657 


0.886 


8 


-3.644 


5.925 


-1.046 


1.126 


-4.943 


6.599 


-1.899 


1.411 


10 


-5.749 


6.917 


-2.134 


1.560 


-7.931 


7.232 


-3.666 


1.907 


12 


-8.142 


7.552 


-3.508 


1.976 


-11.30 


7.410 


-5.826 


2.385 



2k 


£23: 10,000 samples 100,000 


A23, £23: 150,000 samples 100,000 


Min 


Max 


Mean 


SD 


Min 


Max 


Mean 


SD 


2 


-0.083 


0.291 


0.000 


0.028 


-0.092 


0.360 


0.000 


0.028 


4 


-0.949 


3.938 


-0.121 


0.423 


-1.067 


4.343 


-0.120 


0.418 


6 


-3.005 


6.506 


-0.937 


1.016 


-3.258 


7.098 


-0.933 


1.008 


8 


-5.977 


7.847 


-2.580 


1.578 


-6.335 


8.623 


-2.573 


1.568 


10 


-9.586 


8.419 


-4.860 


2.109 


-10.030 


9.379 


-4.849 


2.096 


12 


-13.82 


8.462 


-7.613 


2.623 


-14.330 


9.606 


-7.597 


2.607 



Finally, we consider the correlations of the 2k moment. To quantify such 



correlations, we computed the autocovariances c„ 
defined as 



of the 2k th moment. These are 



(4.11) 
where 



1 R 



(4.12) 



R 

/ \((l/2 + it)\ 2k dt, x=-Y^x r , 

J B r K r=i 
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TABLE 14. Moments of log(empirical moment/predicted moment) normal- 
ized to have mean and variance 1 (predicted moment given in Table The 
columns p = 3, p = 4, ...etc refer to the third moment, fourth moment, ...etc, 
of the quantity log(empirical moment/ predicted moment). 



2k 


Z16: 10,000 samples 100,000 


p=3 


p=4 


p=5 


p=6 


p=7 


p=8 


2 


0.271 


3.338 


3.490 


22.64 


51.80 


264.8 


4 


1.553 


8.237 


39.06 


238.3 


1581 


11380 


6 


1.306 


6.020 


21.40 


99.97 


489.3 


2610 


8 


1.083 


4.916 


14.52 


60.60 


253.2 


1172 


10 


0.954 


4.424 


11.68 


46.68 


178.9 


774.5 


12 


0.875 


4.170 


10.23 


40.21 


146.1 


611.1 



2k 


A23 U B23: 150,000 samples 100,000 


p = 3 


p = 4 


p = 5 


p = 6 


p = 7 


p = 8 


2 


0.862 


5.855 


25.53 


200.2 


1783 


18240 


4 


1.737 


8.773 


42.40 


255.8 


1709 


12560 


6 


1.286 


5.747 


19.83 


89.95 


430.8 


2270 


8 


1.075 


4.826 


14.21 


59.10 


248.7 


1167 


10 


0.966 


4.445 


11.99 


48.34 


190.6 


850.4 


12 


0.902 


4.251 


10.84 


43.22 


164.0 


712.7 



We used a tota l of 10 6 consecutive blocks B r consisting of 10 3 zeros each (so in 
definition (4.11), we have R = 10 6 ). Figure [2] contains graphs of c m /co for m = 
1,2,..., 40, and 2k = 2, near heights T = 10 7 (set s8), T = 10 15 (set zl6), T = 10 19 
(set o20), and T = 10 22 (sets b23-6 and b23-10). 

Figure [2] has several interesting features. Although the autocovariances are over- 
whelmingly positive at low heights (set s8), they become mostly negative at large 
heights (e.g. set b23-10). Additionally, the amount of variation in the autocovari- 
ances c m is substantially more for the first few m at all heights considered. 

We plotted c m /co for two different sets near T — 10 22 (sets b23-6 and b23- 
10). The two plots look almost identical, which suggests the autocovariances are 
significant. To better quantify this, we plotted the autocovariances of set b23-10 
with the blocks randomly permuted (set "b23-10 randomized"). Quite visibly, the 
autocovariances of the randomized set are much smaller than those of the original 
ordered set. This points to some long range correlations in the values of the second 
moment. 

As for higher moments, the autocovariances do not appear to be significant. 
When 2k = 4, they are hardly distinguishable from those of a randomized set, and 
when 2k = 6, 8, 10, 12, they become completely indistinguishable. 

The apparent significance of the autocovariances in the case of the second mo- 
ment prompted us to consider the following shifted fourth moment: 



(4.13) 



M(T, H; a) 



1 

H 



T+H 



|C(l/2 + it)\ 2 |C(l/2 + it + ia)\ 2 dt . 



If the values of |C(l/2 + it)\ and |C(l/2 + it + ia)\ are statistically independent of 



each other, which is plausible for large a, then one expects the integral in (4.13 1 to 
split, so it is approximated by 
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Figure 2. Correlations of the second moment. 



s8 



30 



z16 




o20 
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(4.14) 



T+H 



M(T, H- a) =— / |C(l/2 + it)\ z \((l/2 + it + ia)\ z dt 

ti Jt 



H 



T+H 



\((l/2 + it)\ 2 dt 



H 



T+H 



\({l/2 + it + ia)\ 2 dt) , 



and the latter is directly related to the autocovariances. Kosters [Ko and Chandee |Ch| 
investigated the function M(T, T; a) as well as other more general shifted moments. 
Kosters' work immediately implies if alogT = 0(1) (as T — > oo), and if 



(4.15) 
then 



K(T s = ^2 / _ 4sin 2 (q logT/2) \ 

1 ' ' ' (alogT) 2 v (alogT) 2 J 
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Figure 3. Shifted fourth moment M(T,H;a)/M(T,H;0), black 
dots (draw n for a a multiple of 0.03), versus the kernel K(T;a) 
defined in U.lty, dashed line. T « 10 22 , H w 6.5 x 10 5 . 
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(4.16) 



M(T, T; a) 



K(T; a) 



M(T,T;0) 

where right side is defined for a = by taking a limit. It is plausible that relation 



(4.16) would continue to hold if the left side is replaced by 



M(T, H; a) _ fi +H |C(l/2 + it)\ 2 |C(l/2 + it + ia)\ 2 dt 
M(T,H;0) ~ 



(4.17) 



| T T+H |C(l/2 



, , sl _ , it)\ 4 dt 

when H is much smaller than T but not too small. This is supported to some 
extent by Fi gure [3] since it shows a reasonable agreement between ( 4.17[ ) and the 
kernel (|4.15|) for < a < 1.5, T 10 22 , and H ks 6.5 x 10 5 . However, for 



a > 0.4, Figure [3] points to deviations from the kernel model. And Figure [4] shows 
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that, starting around a = 1.5, there is a clear departure from the kernel model, 
and the onset of asymptotic oscillations (of amplitude < 0.018). Furthermore, the 
oscillations remain significant for large a (large relative to logT); for example, we 
computed M(T,H; 100) = 0.011447 and M(T,H; 1000) = 0.004035. Presumably, 
these long-range oscillations are induced by prime sums present in the lower order 
terms of relation (4.16). 



Figure 4. Shifted fourth moment M(T,H;a)/M(T,H;0), black 
dots (draw n for a a multiple of 0.5), versus the kernel K(T;a) 
defined in (|4.15b, dashed line. T « 10 22 , H ss 6.5 x 10 5 . 
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5. Numerical methods 
We calculated the moments of |C(l/2 + it)\ by evaluating integrals similar to 



(1.4) near T = 10 and at lower heights. The computational method involved two 



main choices, a method to approximate point-wise values of |C(l/2 + it)\ for large 



t, and a suitable integration technique to handle integrals like (1.4). 



5.1. Point-wise approximations. One method to numerically evaluate C(l/2 + it) 
is based on Euler-Maclaurin summation. This method is derived from the summa- 
tion by parts formula, and it can evaluate the zeta function on the critical line 
with great accuracy. But it requires f 1 +°^( 1 ) elementary operations on numbers of 
C\(log£) bits to compute C(l/2 + it) to within ±t~ K for a single value of t even 
if we do not demand high accuracy. This is prohibitive when t is near 10 22 , as in 
our experiments. Note that asymptotic constants are taken as t — ¥ oo, and the 
notations o K () and O k () mean asymptotic constants depend on the parameter k 
only. 

A much more efficient method to numerically approximate C(l/2 + it) is the 
Riemann-Siegel formula, which was invented by Riemann and rediscovered by Siegel 
during the latter's study of Riemann's notes. The Riemann-Siegel formula can be 
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used to compute C(l/2 + it) to within ±t~ K for large t using i 1 / 2 +°«( 1 ) operations 
on numbers of K (\ogt) bits. 

The running times stated above are for computing single values of the zeta func- 
tion. For computations that involve many evaluations of C(l/2 + it) in a restricted 
interval, the Odlyzko-Schonhage [OS algorithm (OS) offers significant savings. 

Theorem 5.1 (The Odlyzko-Schonhage algorithm (OS)). Given c > 0, e > 0, and 
a G [0,1/2] there exists an effectively computable c\ — Cx(c,a, e) > so that one 
can compute Z(t) for any value oft>0 in the range T < t < T + T a with accuracy 
T~ c using < cxt £ operations on numbers of < c\ logi bits provided a precomputation 
involving < ciT 1 / 2 " 1 " 6 operations and < c\T a+<L bits of storage is done beforehand. 



For example, choosing a = 1/2 in Theorem 5.1 one can compute Z(t) at n rs 
T 1 / 2 points in the range (T, T+-\/T) using n 1+€ operations, which is nearly optimal. 
By comparison, the Riemann-Siegel formula requires n 2+£ operations to achieve the 
same task, and the Euler-Maclaurin formula requires n 3+e operations. 

Our approach to computing values of the zeta function was to use the results of 
the precomputations of the OS algorithm that had been carried out by the second 
author at AT&T Labs - Research. 

Although computations using the OS method, as well as the other methods 
mentioned previously, can be made completely rigorous, provided one uses multi- 
precision arithmetic, this is almost never done in practice because of the time cost. 
Instead, the validity of the final results depends on the assumption that there is 
quasi random cancellation of round-off errors, plus some additional checks. For a 
discussion, see [HQl IUI1I02] . 

In addition to OS, we used another method to compute \£(l/2+it)\ that appears 
to be valid over short ranges and is numerically analyzed in detail is Section 6. To 
describe briefly, though, suppose in some zero interval I n := (7n,7n+i) one knows 
the value of |C(l/2 + i(7n + 7n+i)/2) as well as the location of the m zeros below 
and m zeros above j n . Then one can try to approximate |C(l/2 + it)\ for t G /„ 
by a polynomial that assumes the value |C(l/2 + i((7 n +7„+i)/2) at (7„+7„ + i)/2 
and vanishes at those 2m neighboring zeros. Let us denote this approximation by 
HP since it is essentially a truncated Hadamard product. Numerical experiments, 
which are discussed in Section 5, suggest the quality of HP improves linearly with 
m, which is the number of zeros used (the improvement is in the sense of L°°-error). 

In general, HP approximations are not very accurate. For example, with m — 2 9 
and for t w 10 22 , HP has an L°° error of about 6 x 10~ 2 . On the other hand, HP 
is faster than OS. If used judiciously (see Section 5.3 for a detailed explanation), it 
can reduce the running time of our computations while not affecting the expected 
accuracy of the moment data. 

5.2. Integration Methods. Our goal is to calculate the 2nd to 12th even mo- 
ments of |C(l/2 + zt)|P] To control for oscillations, each integration interval has two 
consecutive zeros of Z(t) as its endpoints; i.e., is of the form /„ = (7„,7„ + i). 

We experimented with several integration methods. We settled on Romberg 
integration as our choice, due to its simplicity, fast convergence, and ability to 
naturally provide a posteriori error estimates. 



2 Our computations were slightly more comprehensive because they included the —0.5 to 12 
moments in increments of 1/2. But only the even moments are discussed in this paper. 
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5.3. Numerical Implementation. We first discuss implementation details in the 
vicinity of zero number 10 23 (i.e. sets A23 and B23). Numerical tests on overlap- 
ping intervals showed the distribution of the L°°-error of the OS approximation 
of |C(l/2 + it)\ is normally distributed with mean w and standard deviation 
~ 5 x 10~ 9 , so the OS approximation is typically accurate to within ±5 x 10~ 9 . 
In order to determine the integration intervals, we used the zero sets previously 
compiled by the second author. Numerical tests on overlapping zero sets showed 
the L°°-error in the location of zeros hovered around 5 x 10 -8 . 

Our initial goal was to obtain enough accuracy to enable comparisons between 



the empirical moments and prediction (1.1 1. For each zero interval, we attempted 
to compute the 2k th moment so that 

Absolute posteriori error <10 — 3 X Average zero gap at height T 

(5.1) , , 

X Expected 2k moment according to ■ 

Note the average zero gap near height T — 10 22 is about 0.128. We aggregated 
the moment data for each 1,000 zeros. Thus, we expect the aggregate error in our 
computed values of the 2k th moment divided by the length of integration interval 
to satisfy 

(5.2) Aggregate posteriori error < 10 — 3 X Expected 2k th moment according to . 



In particular, upon dividing the empirical moment by prediction ( 1.1 1, the ratio will 



typically have 3 significant decimal digits. For the values of k we are considering 



though, the leading term predictions (1.1) are less in size than the full moment 



predictions (1.5). Since the latter are conjectured to be more accurate, then upon 



dividing empirical moments by the full moment prediction (1.5), the ratio should 
have more digits of accuracy (especially for high moments). That is, we expect the 
ratio (empirical moment ( 1.4 ) /predicted moment ( |1.5[ )) to be correct to within 



o Expected 2k moment according to 
(5.3) ± 1(T 3 x 1 



1.1 



1.5 



Expected 2k th moment according to 

Once we decided on this accuracy standard, the following were chosen accord- 
ingly: the number of zeros to use in HP, and a criterion to determine whether HP or 
OS is more appropriate. After some numerical tests, which are described in Section 

4, we decided to fix the number of zeros supplied to HP at 1000 zeros, or 500 zeros 
on each side of each interval. With this choice, HP is about 5 times faster than 

05, it has an L°°-error of about 6 x 10~ 2 and a typical error (i.e. square root of 
L 2 -error) of about 10~ 4 . 

Given the unimodal shape of |£(l/2 + zf)| on each integration interval /„ = 
(7n,7n+i), it is reasonable to try to approximate max t g/ n |C(l/2 + H)\ by C := 
|C(l/2 + ito)\, where to is the midpoint of /„. So, a rough upper bound for the 
contribution of I n to the 2k th moment is C 2k . In particular, if C is small enough, 
then HP can be safely used; i.e. without affecting the accuracy standard. Numerical 
experiments suggested that given our choices so far, we could set the C-threshold 
for using HP at C < 7.0. Note that the contribution of such intervals is, in any 
case, mostly negligible for the 4 th and higher moments at all heights considered. 

To test the accuracy and efficiency of our implementation, we set it to work in 
regions where |C(l/2 + it)\ assumes large values. Based on this, we chose an upper 
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bound of 2048 iterations for Romberg integration. The integration programs used 
double arithmetic. This allows for about 15 to 16 significant digits, which is more 
than sufficient for our purposes. 

In sum, with our choices of the parameters, the program used HP to integrate 
on about 88% of the intervals, the posteriori error typically surpassed the accuracy 
standard, and the program consumed an average of 12 OS function evaluations per 
interval For lower sets 020 and Z16, the parameter choices were decided similarly. 
As for set 58, which consists of the first 10 8 zeta zeros, we relied exclusively on the 
Riemann-Siegel formula to compute the moments. Lastly, as a check of the validity 
of the integration programs, we successfully reproduced moment data computed by 
Michael Rubinstein for t m 10 6 . We implemented the code in FORTRAN 90, and 
ran it onSGI machines at the Minnesota Supercomputing Institute. 

6. Numerical tests of local models of the zeta function 

In addition to HP, which is the polynomial approximation discussed in Section 
5, there is another attractive formula to approximate £(l/2 + it) over a zero interval 
In = {jn,Jn+i) which is due to Gonck, Hughes, and Keating |GHK| . The method 
expresses C(l/2 + it) as the product of two parts: one that resembles a truncated 
Euler product, and another that resembles a truncated Hadamard product. For this 
reason, we will abbreviate the [GHKJ formula as EHP (Eulcr-Hadamard product). 
The approximation EHP does not require the value of C(l/2 + it) at the midpoint 
of the zero interval I n . Instead, it incorporates the contribution of the primes in 
a natural way. The approximation arises from a smoothed approximate functional 
equation for the logarithmic derivative of zeta. 

EHP requires the Euler and Hadamard products to be both suitably smoothed. 
We implemented smoothing for the Euler product, but not for the Hadamard prod- 
uct. It was pointed out to the authors, however, that it is difficult to eliminate 
smoothing from the Hadamard product in EHP with realistic error bounds [H] , 
This suggests smoothing the Hadamard product is critical to the accuracy. Never- 
theless, implementing such smoothing would cause EHP to be too slow in practice. 
Thus, in considering whether to use EHP in our moment calculations, we only 
tested it with an unsmoothcd Hadamard product, which is also how it was briefly 
tested in |GHK| . We found the accuracy of EHP to be comparable to HP, but the 
latter was significantly faster due to its simplicity (the convergence of HP is linear 
in the number of zeros used). 

6.1. Definitions. Gonek et al. have obtained the following formula for the Rie- 
mann zeta function: 



(6.1) 




where s = a + it, a > 0, |t| > 2, X > 2, K any fixed positive integer, and 



(6.2) 




'We counted an HP evaluation as 1/5 OS evaluation. 
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(6.3) 



Z x (s) =exp(-Y,U((s- P ) log xf) 



where u(x) is a nonnegative C°° function of mass 1 with support on [e 1 1 ' x ,e], 
v(t) = J t u(x)dx, U(s) = J u{x)E\(s log x) dx, p denotes a nontrivial zeros of 
the zeta function, and E\(z) — e~ w /wdw is the exponential integral. The 
O-notation constants depend on u and K only. 

A literal implementation of (|6.1|) is not efficient because it is expensive to com- 



pute U(s). For this reason, we use an unsmoothed, truncated version of (6.3 1; in 
particular, we replace u(x) in the definition of U(s) by a delta function at e. We 
assume the Riemann hypothesis, and for t € (7n,7n+i) we define 



(n+m 
- Y, £i(i(*-7i)logX)) 
j=n— m+1 

As for Px(l/2 + it), we did apply smoothing to it because smoothing there is not 
computationally expensive. Specifically, we first define the following non-negative 
C°° function: 



/(«)H e T '>', M- f{xW - x) 



x < 
then define the smoothing kernel 



fZf(x)fO.-x)dx' 



(6.5) u(x) = Xg(X log(x/e) + l)/x. 

Figur[5]is a plot of u(x) when A = 6. Put together then, for t € (7„, 7 n +i) w e have 



FIGURE 5. The kernel u{x) with X = 6 

14 A 
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|C(l/2 + it) | =: |Px(l/2 + it)Z% m (l/2 +it)\ + R hm (t) 



(6.6) 



exp 



E 



A(n) cos(tlogn) 



log n/ log x 



n+m 



xexpf ^ Ci(|t-7j|logX)j +Ri, m (t), 

j=n — m+1 

where Ci(x) — f^°cos(t)/tdt is the cosine integral, and i?i im (t) is the remainder 
function. We remark the numerics showed little sensitivity to the exact choice of 



u(x) (which determines v(x) in (6.6) 



Let us denote the approximation |Py(l/ 2 + it)Z x ' m (l/2 + it)\ in (6.6) by EHP 



since it is an Euler-Hadamard product. Note the approximation depends on two 
parameters: X and m, which control the number of primes and zeros used in Px 
and Z x ' m respectively. 

In addition to EHP, we considered the following approximation of |C(l/2 + it)\: 

let 



n+m 



<r m (t)= n it- 7i i, 

j=n— m+1 

then, for t € (in^Tn+i)? we define 



(6.7) |C(l/2 +it)\ = Q n ' m (t) Ki Q n 2 ^f + R*,m(t) > 

where rj n = (7^+1 + j n )/2, and i?2,m(i) is the remainder function. We denote the 
approximation Q n ' m (t) ^B^/j$ by HP as it is a truncated Euler product. We 
expect it to be a good approximation of |£(l/2 + zf)| because locally, away from the 
pole at 1, the zeta function may be treated as a polynomial with the non-trivial 
zeros as its roots. 



6.2. The function \Px(l/2 + it)\. Formula (6.1) incorporates arithmetic contri- 
butions to the zeta function via Px- We tested whether Px can be expected to 
have a tangible impact on numerical results by measuring its variation about its 
mean. Figure [6] is a plot of |Px(l/2 + iT)\ near 



(6.8) T = 1.306643440879589721233593307594 x 10 22 , 

which is in the vicinity of the 10 23 -rd zero. It shows |Px(l/2 + it)\ varies substan- 
tially. Furthermore, we calculated the variance of |Px(l/2 + it)\ in an interval of 
the form (T, T + H); that is, 

(6-9) h 

\\P x (l/2 + i(T + t))\-Px\ 2 dt, Px:=±[ \P x (l/2 + i(T + t))\dt. 

tl Jo ti J 
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FIGURE 6. \P X (1/2 + i(T + t))\ for X = 6 (solid), 50.92 (short dashes), and 
1000 (long dashes). The interval (T, T + 5) covers about 40 zeros. 

2 °| l\ 




1 2 3 4 5 



We chose H — 128, which approximately the length of the stretch covered by 1,000 
zeros near height T. Table [T5| lists the variance for several values of X. It shows in 
particular both the mean and the variance of Px increase with X. 

TABLE 15. Variation of |Px(l/2 + i(T + f)| for t £ (0, 128). 



Value of X 6 50.92 1000 
Mean 1.36 1.67 1.93 

Variance 1.33 4.88 9.68 



6.3. Approximations of |C(l/2 + it)\. Our experiments relied on a set of 30,000 
consecutive zeros near zero number 10 23 . We denote this set by W. The first zero in 
set W is at the height T specified in ( |6.8[ ) . This is the first zero above Gram point 
number 10 23 + 18, 767, 166, 306. Figures [7] and [8] are plots of the approximation 
EHP, which was defined in (6.6), for several choices of X and m together with a 
plot of |((l/2 + it) | (calculated using the OS algorithm). The figures show that 
agreement is fair given the basic nature of the input supplied to EHP. 



FIGURE 7. |C(l/2 + i(T + t))\ (solid) and EHP (dashed) with X = 6 and 
m = 25 over the interval covering zeros numbered 865 to 885 in the set W. 
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FIGURE 8. |C(l/2+i(T+t))| (solid) and EHP (dashed) with X = 50.92 and 
m = 16 over the interval covering zeros numbered 865 to 885 in the set W . 

120 




1.39 2.77 

6.4. Numerical experiments. For each interval (7„,7„ + i) in the set W, we cal- 
culated the values of |£(l/2 + it)| at l(n) equally spaced points t n ,d inside (j n , 7 n +i), 
where l(n) = 2[10 7 " H Z ^~ 7 " + lj + 1, and A„ is the average spacing of the zeros near 
zero ordinate j n . Notice l(n) is always odd, so |£(l/2 + *(7n+i + 7n)/2)|, which 
is the value at the midpoint, is always included. The values of C(l/2 + it) were 
computed using the OS algorithm. We carried out 29 experiments, each involving 
1,000 consecutive zeros; that is, in experiment v, we used either HP or EHP to 
approximate |C(l/2 + it)\ over the interval 



(6.10) [r v -i,r v ), r v = 7jv+50o+iooo«, v = 1, . . . ,29, 

where N w 10 23 is the number of the first zero in set W . In each experiment, we 
let m (in Z 7 ^™ 1 and Q n ' m ) take on the values 2 r for r = 1, . . . , 8, and we let X take 
on the values 2, 6, 50.92, 1000, 2000, and 4000. Lastly, we let Z*(l/2 + it) denote 
one of the approximations HP or EHP, and define the L°°-error in experiment v by 



(6.11) max \((l/2 + it jtd )- Z*(l/2 + it j4 )\. 

d=l,...,l(j) 
j=r„_i,...,r„-l 

Figure|9]is a scatter plot of the L°°-errors obtained with m = 64 and X = 1000. The 
line y — x (dashed) is included to aid in comparison. The plot suggests that with 
the current choices of parameters X and m, HP is generally a better approximation 
than EHP. 

We are also interested in understanding the effect of modifying the parameter X, 
which essentially controls the importance of the arithmetic part Px relative to the 
polynomial part Zx- Specifically, as X decreases, Px becomes less important and 
we approach the HP model. To this end, consider Table 16| 4 which indicates the 



I/°°-errors reach a minimum when X is in an intermediary region (X w 1000). But 
this minimum is 2.82, which is significantly larger than that of HP whose average 
is 0.13 with m = 256. Also, the standard deviation of the errors is roughly 1 for 
the values of X listed in Table [TBI whereas it is 0.07 for HP. 



4 For X = 50.92, 1000 and 2000, we averaged the L°° error with m = 256 using the results 
from all of the 29 experiments. However, for X = 4000, the average is based on data obtained 
from 15 experiments only. 
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FIGURE 9. The L°° error of HP (horizontal axes) vs. EHP (vertical axes) 
with m = 256 and X = 1000 for all 29 experiments. The dashed line is y = x. 



0.1 0.2 0.3 0.4 0.5 

TABLE 16. The average I/°°-error of EHP with m = 256 over all 29 experiments. 

Value of X 6 50.92 1000 2000 4000 

Average L°°-error 6.84 3.81 2.82 3.11 3.00 



In a sense, the kind of comparison we have carried out so far may be ill-defined, 
because by construction HP is exact at (j n + j n+ i)/2, whereas EHP is not nec- 
essarily so. If EHP is normalized so that it is exact at the midpoint, the results 
improve significantly. Let us call this approximation "normalized EHP." Tabic [T7| 
contains the average L°°-errors of that approximation over all experiments. The 
L°°-errors for normalized EHP appear to reach a local minimum when X ss 6 (i.e. 
the arithmetic part Px uses only the primes 2, 3, and 5). Also, table 18 contains 
the history of convergence of the L°°-errors as m (which determines the number of 
zeros used) increases. 



TABLE 17. The average L°°-error of normalized EHP with m 
all 29 experiments. 



256 over 



Value of X 
Average L ' 



2 

0.52 



2.71828 
0.16 



2.9 
0.30 



6 
0.11 



50.92 1000 
0.50 1.03 



TABLE 18. History of Convergence of the L°°-error. Results based on ex- 
periment 1 (i.e. zeros numbered 500 to 1500 in set W). 



m 


HP 


normalized EHP with X = 50 


normalized EHP with X = 6 


2 


16.0327267 


10.3965279 


13.2324066 


4 


8.6654952 


7.3793465 


4.9913900 


8 


4.9345427 


3.3031051 


1.9168742 


16 


2.0833875 


3.8302768 


2.7077135 


32 


1.0230760 


2.1464398 


1.2456626 


64 


0.5268745 


1.6312318 


0.5550487 


128 


0.2601180 


0.9890934 


0.3347868 


256 


0.1308390 


0.4652148 


0.0544018 



Finally, we considered the convergence rates of normalized EHP for various values 
of X. The convergence rate is defined by 
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.12) 

C r - 



\ogE r - log£: r _i 



where E r is the average L°° -error with m = 2 r 



log(l/2) ' 

Table [19] indicates the convergence rate of normalized EHP is not smooth. Even 
orders of convergence fluctuate considerably, and sometimes are negative. On the 
other hand, HP converges more smoothly at a linear rate. Figure 10 is a visual 
representation of this. We note the amount of time required in the EHP model was 
significantly more than in the HP model. Most of the additional time went into 
evaluating the cosine Integral . 

TABLE 19. Order of Convergence of the L°°-error. Results based on exper- 
iment 1. 



Cv 


HP 


normalized EHP with X = 50 


normalized EHP with X = 6 


c 2 


0.888 


0.495 


1.407 


c 3 


0.812 


1.160 


1.381 


c 4 


1.244 


-0.214 


-0.498 


c 5 


1.026 


0.836 


1.120 


c 6 


0.957 


0.396 


1.166 


cv 


1.018 


0.722 


0.729 


c 8 


0.991 


1.088 


2.622 



FIGURE 10. The vertical axes is the log of the average L°°-errors over all 29 
experiments of HP (solid), EHP with X = 1000 (short dashes), and normalized 
EHP with X = 6 (long dashes). The horizontal axes is log mj log 2, where 
m = 2, 4, 8, . . . , 256. The convergence rate is —slope X t 2 
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